
Weida Wang

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

May 28, 2026
Via arXiv

$δ$-mem: Efficient Online Memory for Large Language Models

May 12, 2026
Via arXiv

MolViBench: Evaluating LLMs on Molecular Vibe Coding

May 05, 2026
Via arXiv

PolyReal: A Benchmark for Real-World Polymer Science Workflows

Apr 03, 2026
Via arXiv

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

Feb 09, 2026
Via arXiv

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Feb 06, 2026
Via arXiv

SecureSplit: Mitigating Backdoor Attacks in Split Learning

Jan 20, 2026
Via arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Dec 30, 2025
Via arXiv

Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning

Oct 02, 2025
Via arXiv

ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Sep 10, 2025
Via arXiv